chat : Avoid partial reasoning tags in response content #15149

p1-0tr · 2025-08-07T10:37:44Z

If a model uses a multi-part reasoning tag, part of that tag can end up in the message content in streaming mode. For example:

```
$ curl -N http://localhost:8080/v1/chat/completions -d '{
  "model": "hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "stream": true
}' -H "Content-Type: application/json"
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"<|channel|>"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"analysis"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"The"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

...
```

This happens because the chat parser cannot fully match the reasoning tag while only its first parts have been generated, so those parts are emitted as content. This PR therefore modifies `try_consume_literal()` to speculatively consume a partially matching string when the parser is constructed with `partial` set to true.
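A minimal C++ sketch of the idea, under stated assumptions: `mini_parser`, `parsed_msg` and `parse_reasoning()` are hypothetical names for a simplified parser and are not llama.cpp's actual chat-parser API, and the reasoning opener `<|channel|>analysis<|message|>` is assumed from gpt-oss's output format. In partial mode, `try_consume_literal()` also succeeds when the remaining input is a proper prefix of the literal, so a half-generated tag is held back instead of being reported as content.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, simplified stand-in for a streaming chat parser. This is NOT
// llama.cpp's real parser; it only illustrates speculative partial matching.
class mini_parser {
  public:
    mini_parser(std::string input, bool is_partial)
        : input_(std::move(input)), is_partial_(is_partial) {}

    // Try to consume `literal` at the current position:
    //  - full match: advance past it and return true;
    //  - partial mode and the remaining input is a proper, non-empty prefix of
    //    `literal`: speculatively consume the remaining input and return true,
    //    so the unfinished tag is held back instead of leaking into content;
    //  - otherwise: leave the position unchanged and return false.
    bool try_consume_literal(std::string_view literal) {
        const std::string_view rest = std::string_view(input_).substr(pos_);
        if (rest.substr(0, literal.size()) == literal) {
            pos_ += literal.size();
            return true;
        }
        if (is_partial_ && !rest.empty() && rest.size() < literal.size() &&
            literal.substr(0, rest.size()) == rest) {
            pos_ = input_.size();
            return true;
        }
        return false;
    }

    // Return everything from the current position to the end of the input.
    std::string consume_rest() {
        std::string out = input_.substr(pos_);
        pos_ = input_.size();
        return out;
    }

  private:
    std::string input_;
    std::size_t pos_ = 0;
    bool is_partial_;
};

struct parsed_msg {
    std::string content;
    std::string reasoning;
};

// Hypothetical helper. Assumes the reasoning block opens with the
// "<|channel|>analysis<|message|>" sequence; text after it is treated as
// reasoning, otherwise the whole text is treated as content.
parsed_msg parse_reasoning(const std::string & text, bool is_partial) {
    constexpr std::string_view reasoning_start = "<|channel|>analysis<|message|>";
    mini_parser p(text, is_partial);
    parsed_msg msg;
    if (p.try_consume_literal(reasoning_start)) {
        msg.reasoning = p.consume_rest();
    } else {
        msg.content = p.consume_rest();
    }
    return msg;
}
```

With `is_partial` set to true, the accumulated outputs `<|channel|>` and `<|channel|>analysis` both yield empty content and empty reasoning, and `<|channel|>analysis<|message|>The` yields the reasoning `The`. With `is_partial` set to false, `<|channel|>` is returned as content, which corresponds to the leak in the stream above. Restricting the speculative path to partial parses means a final, non-partial parse still treats an unfinished tag as ordinary text.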

Signed-off-by: Piotr Stankiewicz <[email protected]>

p1-0tr · 2025-08-14T14:39:15Z

No longer needed with #15181

github-actions bot added the testing label (Everything test related) on Aug 7, 2025

p1-0tr force-pushed the ps-fix-reasoning-tags-in-content branch from 7e13319 to 4c64211 on August 7, 2025 11:09

pwilkin mentioned this pull request Aug 7, 2025

Misc. bug: gpt-oss-20b perplexity broken #15155

Closed

p1-0tr force-pushed the ps-fix-reasoning-tags-in-content branch from 4c64211 to 82bf586 on August 11, 2025 07:10

p1-0tr closed this Aug 14, 2025
